Abstract

We  present  a  novel  methodology  for  building  human¬ 
like  artificially  intelligent  systems.  We  take  as  a  model 
the  only  existing  systems  which  are  universally  ac¬ 
cepted  as  intelligent:  humans.  We  emphasize  building 
intelligent  systems  which  are  not  masters  of  a  single  do¬ 
main,  but,  like  humans,  are  adept  at  performing  a  vari¬ 
ety  of  complex  tasks  in  the  real  world.  Using  evidence 
from  cognitive  science  and  neuroscience,  we  suggest 
four  alternative  essences  of  intelligence  to  those  held 
by  classical  AI.  These  are  the  parallel  themes  of  devel¬ 
opment,  social  interaction,  embodiment,  and  integra¬ 
tion.  Following  a  methodology  based  on  these  themes, 
we  have  built  a  physical  humanoid  robot.  In  this  paper 
we  present  our  methodology  and  the  insights  it  affords 
for  facilitating  learning,  simplifying  the  computation 
underlying  rich  behavior,  and  building  systems  that 
can  scale  to  more  complex  tasks  in  more  challenging 
environments. 

Introduction 

An  early  development  in  the  history  of  AI  was  the  claim 
of  Newell  &  Simon  (1961)  that  humans  use  physical 
symbol systems to “think”. Over time, this claim has been
adopted into Artificial Intelligence as an implicit and
dominant hypothesis (see Brooks (1991a) for a review).
Although this assumption has begun to soften in
recent  years,  a  typical  AI  system  still  relies  on  uniform, 
explicit,  internal  representations  of  capabilities  of  the 
system,  the  state  of  the  outside  world,  and  the  desired 
goals.  These  systems  are  dominated  by  search  prob¬ 
lems  to  both  access  the  relevant  facts,  and  determine 
how  to  apply  them.  Neo-classical  AI  adds  Bayesian  or 
other  probabilistic  ideas  to  this  basic  framework  (Pearl 
1988). 

The  underlying  assumption  of  these  approaches  is 
that, because a description of reasoning/behavior/learning
is possible at some level, that description
must  be  made  explicit  and  internal  to  any  mechanism 
that  carries  out  the  reasoning/behavior/learning.  The 
realization  that  descriptions  and  mechanisms  could  be 
separated was one of the great breakthroughs of Rosenschein
& Kaelbling (1986), but unfortunately that realization
has been largely ignored. This introspective
confusion  between  surface  observations  and  deep  struc¬ 
ture  has  led  AI  away  from  its  original  goals  of  build¬ 
ing  complex,  versatile,  intelligent  systems  and  towards 
the  construction  of  systems  capable  of  performing  only 
within  limited  problem  domains  and  in  extremely  con¬ 
strained  environmental  conditions. 

In  this  paper  we  present  a  methodology  based  on  a 
different set of basic assumptions. We believe that human
intelligence is a direct result of four intertwined
attributes:  developmental  organization,  social  interac¬ 
tion,  embodiment  and  physical  coupling,  and  multi¬ 
modal  integration.  Development  forms  the  framework 
by  which  humans  successfully  acquire  increasingly  more 
complex  skills  and  competencies.  Social  interaction  al¬ 
lows  humans  to  exploit  other  humans  for  assistance, 
teaching,  and  knowledge.  Embodiment  and  physical 
coupling  allow  humans  to  use  the  world  itself  as  a  tool 
for  organizing  and  manipulating  knowledge.  Integra¬ 
tion  allows  humans  to  maximize  the  efficacy  and  accu¬ 
racy  of  complementary  sensory  and  motor  systems. 

We  have  followed  this  methodology  to  construct 
physical  humanoid  robots  (see  Figure  1).  We  design 
these  robots  to  follow  the  same  sorts  of  developmen¬ 
tal  paths  that  humans  follow,  building  new  skills  upon 
earlier  competencies.  People  interact  with  these  robots 
through behavioral coupling and direct physical contact.
The variety of sensory and motor systems on the robots
provides ample opportunity to confront integration issues.

Using  evidence  from  human  behavior,  and  early  re¬ 
sults  from  our  own  work,  we  argue  that  building  sys¬ 
tems  in  this  manner  affords  key  insights  into  how  to  sim¬ 
plify  the  computation  underlying  rich  behavior,  how  to 
facilitate  learning,  and  how  to  create  mechanisms  that 
can  scale  to  more  complex  tasks  in  more  challenging 
environments. 

The  next  section  of  this  paper  explores  the  assump¬ 
tions  about  human  intelligence  which  are  deeply  embed¬ 
ded  within  classical  AI.  The  following  sections  explain 
how  our  methodology  yields  a  plausible  approach  to 
creating robustly functioning intelligent systems, drawing
on examples from our research. The final section presents
an outline of the key challenges to be faced along this new
road in AI.

Figure 1: Our humanoid robot has undergone many
transformations over the last few years. This is how it
currently appears.

Assumptions  about  Intelligence 

In  recent  years,  AI  research  has  begun  to  move  away 
from  the  assumptions  of  classical  AI:  monolithic  in¬ 
ternal  models,  monolithic  control,  and  general  purpose 
processing.  However,  these  concepts  are  still  prevalent 
in  much  current  work,  and  are  deeply  ingrained  in  many 
architectures  for  intelligent  systems.  For  example,  in 
the recent AAAI-97 Proceedings, one sees a continuing
interest in planning (Littman 1997, Hauskrecht 1997,
Boutilier & Brafman 1997, Blythe & Veloso 1997, Brafman
1997) and representation (McCain & Turner 1997, Costello
1997, Lobo, Mendez & Taylor 1997), which build on these
assumptions.

The  motivation  for  our  alternative  methodology 
comes  from  a  modern  understanding  of  cognitive  sci¬ 
ence  and  neuroscience,  which  counterposes  the  assump¬ 
tions  of  classical  AI,  as  described  in  the  following  sec¬ 
tions. 

Humans  have  no  full  monolithic  internal  mod¬ 
els.  There  is  evidence  that  in  normal  tasks  humans 
tend  to  minimize  their  internal  representation  of  the 
world.  Ballard,  Hayhoe  &  Pelz  (1995)  have  shown  that 
in performing a complex task, like building a copy of
a display of blocks, humans do not build an internal
model  of  the  entire  visible  scene.  By  changing  the  dis¬ 
play  while  subjects  were  looking  away,  Ballard  found 
that  subjects  noticed  only  the  most  drastic  of  changes; 
rather  than  keeping  a  complete  model  of  the  scene,  they 
instead  left  that  information  in  the  world  and  continued 
to  refer  back  to  the  scene  while  performing  the  copying 
task. 

There  is  also  evidence  that  there  are  multiple  inter¬ 
nal  representations,  which  are  not  mutually  consistent. 
For example, in the phenomenon of blindsight, cortically
blind  patients  can  discriminate  different  visual  stimuli, 
but  actually  report  seeing  nothing  (Weiskrantz  1986). 
This  inconsistency  would  not  be  a  feature  of  a  single 
central  model  of  visual  space. 

These experiments and many others like them (e.g.
Rensink, O’Regan & Clark 1997, Gazzaniga & LeDoux
1978) convincingly demonstrate that humans do not
construct  a  full,  monolithic  model  of  the  environment. 
Instead  humans  tend  to  only  represent  what  is  imme¬ 
diately  relevant  from  the  environment,  and  those  rep¬ 
resentations  do  not  have  full  access  to  one  another. 

Humans  have  no  monolithic  control.  Naive  in¬ 
trospection  and  observation  can  lead  one  to  believe  in 
a  neurological  equivalent  of  the  central  processing  unit 
-  something  that  makes  the  decisions  and  controls  the 
other  functions  of  the  organism.  While  there  are  un¬ 
doubtedly  control  structures,  this  model  of  a  single,  uni¬ 
tary  control  system  is  not  supported  by  evidence  from 
cognitive  science. 

One  example  comes  from  studies  of  split  brain  pa¬ 
tients  by  Gazzaniga  &  LeDoux  (1978).  These  are  pa¬ 
tients  where  the  corpus  callosum  (the  main  structure 
connecting  the  two  hemispheres  of  the  brain)  has  been 
cut.  The  patients  are  surprisingly  normal  after  the  op¬ 
eration,  but  with  deficits  that  are  revealed  by  presenting 
different  information  to  either  side  of  the  (now  uncon¬ 
nected)  brain.  Since  each  hemisphere  controls  one  side 
of  the  body,  the  experimenters  can  probe  the  behav¬ 
ior  of  each  hemisphere  independently  (for  example,  by 
observing  the  subject  picking  up  an  object  appropriate 
to  the  scene  that  they  had  viewed).  In  one  example,  a 
snow  scene  was  presented  to  the  right  hemisphere  and 
the  leg  of  a  chicken  to  the  left.  The  subject  selected  a 
chicken  head  to  match  the  chicken  leg,  explaining  with 
the  verbally  dominant  left  hemisphere  that  “I  saw  the 
claw  and  picked  the  chicken”.  When  the  right  hemi¬ 
sphere  then  picked  a  shovel  to  correctly  match  the  snow, 
the  left  hemisphere  explained  that  you  need  a  shovel 
to  “clean  out  the  chicken  shed”  (Gazzaniga  &  LeDoux 
1978,  p.148).  The  separate  halves  of  the  subject  in¬ 
dependently  acted  appropriately,  but  one  side  falsely 
explained  the  choice  of  the  other.  This  suggests  that 
there  are  multiple  independent  control  systems,  rather 
than  a  single  monolithic  one. 

Humans  are  not  general  purpose.  The  brain  is 
conventionally thought to be a general purpose machine,
acting with equal skill on any type of operation that it
performs by invoking a set of powerful rules.
However,  humans  seem  to  be  proficient  only  in  partic¬ 
ular  sets  of  skills,  at  the  expense  of  other  skills,  often 
in  non-obvious  ways.  A  good  example  of  this  is  the 
Stroop  effect  (Stroop  1935).  When  presented  with  a 
list  of  words  written  in  a  variety  of  colors,  performance 
in  a  color  recognition  and  articulation  task  is  actu¬ 
ally  dependent  on  the  semantic  content  of  the  words; 
the  task  is  very  difficult  if  names  of  colors  are  printed 
in  non-corresponding  colors.  This  experiment  demon¬ 
strates  the  specialized  nature  of  human  computational 
processes  and  interactions. 

Even  in  the  areas  of  deductive  logic,  humans  often 
perform  extremely  poorly  in  different  contexts.  Wason 
(1966)  found  that  subjects  were  unable  to  apply  the 
negative  rule  of  if-then  inference  when  four  cards  were 
labeled  with  single  letters  and  digits.  However,  with 
additional  context — labeling  the  cards  such  that  they 
were  understandable  as  names  and  ages — subjects  could 
easily  solve  exactly  the  same  problem. 

Further, humans often do not use subroutine-like
rules  for  making  decisions.  They  are  often  more  emo¬ 
tional  than  rational,  and  there  is  evidence  that  this 
emotional  content  is  an  important  aspect  of  decision 
making  (Damasio  1994). 

Essences  of  Human  Intelligence 

Since  humans  are  vastly  complex  systems,  we  do  not 
expect  to  duplicate  every  facet  of  their  operation.  How¬ 
ever,  we  must  be  very  careful  not  to  ignore  aspects  of 
human  intelligence  solely  because  they  appear  complex. 
Classical and neo-classical AI tend to ignore or avoid
these  complexities,  in  an  attempt  to  simplify  the  prob¬ 
lem  (Minsky  &  Papert  1970).  We  believe  that  many  of 
these  discarded  elements  are  essential  to  human  intel¬ 
ligence  and  that  they  actually  simplify  the  problem  of 
creating  human-like  intelligence. 

Development. Humans are not born with complete
reasoning  systems,  complete  motor  systems,  or  even 
complete  sensory  systems.  Instead,  they  undergo  a 
process  of  development  where  they  are  able  to  perform 
more  difficult  tasks  in  more  complex  environments  en 
route  to  the  adult  state.  This  is  a  gradual  process,  in 
which  earlier  forms  of  behavior  disappear  or  are  modi¬ 
fied  into  more  complex  types  of  behavior.  The  adaptive 
advantage  of  the  earlier  forms  appears  to  be  that  they 
prepare  and  enable  more  advanced  forms  of  behavior  to 
develop  within  the  situated  context  they  provide.  The 
developmental  psychology  literature  abounds  with  ex¬ 
amples  of  this  phenomenon.  For  instance,  the  work  of 
Diamond  (1990)  shows  that  infants  between  five  and 
twelve  months  of  age  progress  through  a  number  of 
distinct  phases  in  the  development  of  visually  guided 
reaching.  In  one  reaching  task,  the  infant  must  re¬ 
trieve  a  toy  from  inside  a  transparent  box  with  only  one 
open  side.  In  this  progression,  infants  in  later  phases 
consistently  demonstrate  more  sophisticated  reaching 
strategies to retrieve the toy in more challenging scenarios.
As the infant’s reaching competency develops, later
stages  incrementally  improve  upon  the  competency  af¬ 
forded  by  the  previous  stage. 

Social Interaction. Human infants are extremely dependent
on their caregivers, relying upon them not only
for  basic  necessities  but  also  as  a  guide  to  their  develop¬ 
ment.  The  presence  of  a  caregiver  to  nurture  the  child 
as  it  grows  is  essential.  This  reliance  on  social  con¬ 
tact  is  so  integrated  into  our  species  that  it  is  hard  to 
imagine  a  completely  asocial  human.  However,  severe 
developmental  disorders  sometimes  give  us  a  glimpse 
of  the  importance  of  social  contact.  One  example  is 
autism.  Autistic  children  often  appear  completely  nor¬ 
mal  on  first  examination;  they  look  normal,  have  good 
motor  control,  and  seem  to  have  normal  perceptual  abil¬ 
ities.  However,  their  behavior  is  completely  strange  to 
us,  in  part  because  they  do  not  recognize  or  respond  to 
normal  social  cues  (Baron-Cohen  1995).  They  do  not 
maintain  eye  contact,  recognize  pointing  gestures,  or 
understand  simple  social  conventions.  Even  the  most 
highly  functioning  autistics  are  severely  disabled  in  our 
society. 

Embodiment. Perhaps the most obvious, and most
overlooked,  aspect  of  human  intelligence  is  that  it  is 
embodied.  Humans  are  embedded  in  a  complex,  noisy, 
constantly  changing  environment.  There  is  a  direct 
physical  coupling  between  action  and  perception,  with¬ 
out  the  need  for  an  intermediary  representation.  This 
coupling  makes  some  tasks  simple  and  other  tasks  more 
complex.  By  exploiting  the  properties  of  the  complete 
system,  certain  seemingly  complex  tasks  can  be  made 
computationally  simple.  For  example,  when  putting  a 
jug  of  milk  in  the  refrigerator,  you  can  exploit  the  pen¬ 
dulum  action  of  your  arm  to  move  the  milk  (Greene 
1982).  The  swing  of  the  jug  does  not  need  to  be  explic¬ 
itly  planned  or  controlled,  since  it  is  the  natural  behav¬ 
ior  of  the  system.  Instead  of  having  to  plan  the  whole 
motion,  the  system  only  has  to  modulate,  guide  and 
correct  the  natural  dynamics.  For  an  embodied  system, 
internal  representations  can  be  ultimately  grounded 
in  sensory-motor  interactions  with  the  world  (Lakoff 
1987). 

Integration. Humans have the capability to receive
an  enormous  amount  of  information  from  the  world. 
Visual,  auditory,  somatosensory,  and  olfactory  cues  are 
all  processed  simultaneously  to  provide  us  with  our  view 
of  the  world.  However,  there  is  evidence  that  the  sen¬ 
sory  modalities  are  not  independent;  stimuli  from  one 
modality  can  and  do  influence  the  perception  of  stim¬ 
uli  in  another  modality.  Churchland,  Ramachandran 
&  Sejnowski  (1994)  describe  an  experiment  illustrating 
how  audition  can  cause  illusory  visual  motion.  A  fixed 
square  and  a  dot  located  to  its  left  are  presented  to 
the observer. Without any sound stimuli, the blinking of
the dot does not result in any perception of motion. If a
tone is alternately played in the left and right ears, with
the left ear tone coinciding with the dot presentation,
there is an illusory perception of back and forth motion of
the dot, with the square acting as a visual occluder. Vision
can cause auditory illusions too, such as the McGurk effect
(Cohen & Massaro 1990). These studies demonstrate that
the human senses cannot be treated as completely
independent processes.

Figure 2: We have built two active vision heads, similar
in design to Cog’s head. On top is a desktop version
with a 1 DOF neck, and below a head with actuators
to include facial expressions.

Methodology 

Our  methodology — exploring  themes  of  development, 
social  interaction,  physical  interaction  and  integration 
while  building  real  robots — is  motivated  by  two  ideas. 
First,  we  believe  that  these  themes  are  important  as¬ 
pects  of  human  intelligence.  Second,  from  an  engi¬ 
neering  perspective,  these  themes  make  the  problems 
of  building  human  intelligence  easier. 

Embodiment 

A principal tenet of our methodology is to build and
test real robotic systems. We believe that building
human-like intelligence requires human-like interaction
with the world (Brooks & Stein 1994). Humanoid form
is  important  to  allow  humans  to  interact  with  the  robot 
in  a  natural  way.  In  addition  we  believe  that  building  a 
real  system  is  computationally  less  complex  than  sim¬ 
ulating  such  a  system.  The  effects  of  gravity,  friction, 
and  natural  human  interaction  are  obtained  for  free, 
without  any  computation. 

Our  humanoid  robot,  named  Cog  and  shown  in  Fig¬ 
ure  1,  approximates  a  human  being  from  the  waist  up 
with twenty-one degrees-of-freedom (DOF) and a variety
of sensory systems. The physical structure of the
robot,  with  movable  torso,  arms,  neck  and  eyes  gives  it 
human-like  motion,  while  the  sensory  systems  (visual, 
auditory,  vestibular,  and  proprioceptive)  provide  rich 
information  about  the  robot  and  its  immediate  envi¬ 
ronment.  These  together  present  many  opportunities 
for  interaction  between  the  robot  and  humans. 

In  addition  to  the  full  humanoid,  we  have  also  devel¬ 
oped  active  head  platforms,  of  similar  design  to  Cog’s 
head,  as  shown  in  Figure  2  (Scassellati  1998a).  These 
self-contained  systems  allow  us  to  concentrate  on  vari¬ 
ous  issues  in  close  human-machine  interaction,  includ¬ 
ing  face  detection,  imitation,  emotional  display  and 
communication (Scassellati 1998b, Ferrell 1998c).

Development 

Building  systems  developmentally  facilitates  learning 
both  by  providing  a  structured  decomposition  of  skills 
and  by  gradually  increasing  the  complexity  of  the  task 
to  match  the  competency  of  the  system. 

Bootstrapping. Development is an incremental process.
As it proceeds, prior structures and their behavioral
manifestations place important constraints on the
later  structures  and  proficiencies.  The  earlier  forms 
bootstrap  the  later  structures  by  providing  subskills 
and  knowledge  that  can  be  re-used.  By  following  the 
developmental  progression,  the  learning  difficulties  at 
each  stage  are  minimized.  Within  our  group,  Scassellati 
(1996)  discusses  how  a  humanoid  robot  might  acquire 
basic  social  competencies  through  this  sort  of  develop¬ 
mental  methodology. 

The  work  of  Marjanovic,  Scassellati  &  Williamson 
(1996)  applied  bootstrapping  techniques  to  our  robot, 
coordinating  visual  and  motor  systems  by  learning  to 
point toward a visual target. A map used for a saccading
behavior (visual/eye-movement map) was reused to
learn a reaching behavior (visual/arm-movement map).
The  learned  saccadic  behavior  bootstrapped  the  reach¬ 
ing  behavior,  reducing  the  complexity  of  the  overall 
learning  task.  Other  examples  of  developmental  learn¬ 
ing  that  we  have  explored  can  be  found  in  (Ferrell  1996). 
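
The reuse of a saccade map to train a reaching map can be illustrated with a deliberately small sketch. It is a one-dimensional toy under stated assumptions, not the implementation reported by Marjanovic, Scassellati & Williamson (1996): both maps are reduced to single linear gains, the error-driven update rule is a generic LMS-style correction, and the names train_saccade_gain, train_reach_gain, pixels_per_unit and arm_units_per_cmd are hypothetical.

```python
import random


def train_saccade_gain(pixels_per_unit, steps=500, rate=0.1, seed=0):
    """Stage 1: learn eye command = gain * retinal offset.

    The only training signal is the retinal offset that remains after
    each attempted saccade.
    """
    rng = random.Random(seed)
    gain = 0.0
    for _ in range(steps):
        offset = rng.uniform(-1.0, 1.0)                 # target offset on the retina
        eye_move = gain * offset                        # command from the current map
        residual = offset - pixels_per_unit * eye_move  # offset left after the saccade
        gain += rate * residual * offset                # LMS-style correction
    return gain


def train_reach_gain(saccade_gain, pixels_per_unit, arm_units_per_cmd,
                     steps=500, rate=0.1, seed=1):
    """Stage 2: learn arm command = gain * gaze angle.

    The previously learned saccade map is reused to convert the visual
    error of the hand (in pixels) into an error in gaze coordinates.
    """
    rng = random.Random(seed)
    gain = 0.0
    for _ in range(steps):
        gaze = rng.uniform(-1.0, 1.0)                   # gaze angle after foveating the target
        hand = arm_units_per_cmd * gain * gaze          # where the hand lands, in gaze units
        hand_offset = pixels_per_unit * (hand - gaze)   # hand seen relative to the fovea
        gaze_error = saccade_gain * hand_offset         # saccade map: pixels -> gaze units
        gain -= rate * gaze_error * gaze                # correct the reach map
    return gain


if __name__ == "__main__":
    s = train_saccade_gain(pixels_per_unit=2.0)
    r = train_reach_gain(s, pixels_per_unit=2.0, arm_units_per_cmd=3.0)
    print(s, r)
```

In this toy the second stage never measures the arm error directly in motor coordinates; it only works because the first stage already supplies a pixel-to-gaze conversion, which is the sense in which the earlier skill bootstraps the later one.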

Gradual increase in complexity. The developmental
process, starting with a simple system that gradually
becomes more complex, allows efficient learning
throughout the whole process. For example, infants are
born  with  low  acuity  vision  which  simplifies  the  visual 
input they must process. The infant’s visual performance
develops in step with their ability to process the
influx  of  stimulation  (Johnson  1993).  The  same  is  true 
for  the  motor  system.  Newborn  infants  do  not  have  in¬ 
dependent  control  over  each  degree  of  freedom  of  their 
limbs,  but  through  a  gradual  increase  in  the  granular¬ 
ity  of  their  motor  control  they  learn  to  coordinate  the 
full complexity of their bodies. A process in which the
acuity of both sensory and motor systems is gradually
increased significantly reduces the difficulty of the
learning problem (Thelen & Smith 1994).

To  further  facilitate  learning,  the  gradual  increase  in 
internal  complexity  associated  with  development  should 
be  accompanied  by  a  gradual  increase  in  the  complexity 
of  the  external  world.  For  an  infant,  the  caregiver  bi¬ 
ases  how  learning  proceeds  by  carefully  structuring  and 
controlling  the  complexity  of  the  environment.  This 
approach  is  in  stark  contrast  to  most  machine  learn¬ 
ing  methods,  where  the  robot  learns  in  a  usually  hos¬ 
tile  environment,  and  the  bias,  instead  of  coming  from 
the robot’s interaction with the world, is included by
the  designer.  We  believe  that  gradually  increasing  the 
complexity  of  the  environment  makes  learning  easier 
and  more  robust. 

By  exploiting  a  gradual  increase  in  complexity  both 
internal  and  external,  while  reusing  structures  and  in¬ 
formation  gained  from  previously  learned  behaviors,  we 
hope  to  be  able  to  learn  increasingly  sophisticated  be¬ 
haviors.  We  believe  that  these  methods  will  allow  us  to 
construct  systems  which  do  scale  autonomously  (Ferrell 
&  Kemp  1996). 

Social  Interaction 

Building  social  skills  into  an  artificial  intelligence  pro¬ 
vides  not  only  a  natural  means  of  human-machine  in¬ 
teraction  but  also  a  mechanism  for  bootstrapping  more 
complex  behavior.  Our  research  program  has  investi¬ 
gated  social  interaction  both  as  a  means  for  bootstrap¬ 
ping  and  as  an  instance  of  developmental  progression. 

Bootstrapping. Social interaction can be a means to
facilitate learning. New skills may be socially transferred
from  caregiver  to  infant  through  mimicry  or  imitation, 
through  direct  tutelage,  or  by  means  of  scaffolding,  in 
which  a  more  able  adult  manipulates  the  infant’s  in¬ 
teractions  with  the  environment  to  foster  novel  abili¬ 
ties.  Commonly  scaffolding  involves  reducing  distrac¬ 
tions,  marking  the  task’s  critical  attributes,  reducing 
the  number  of  degrees  of  freedom  in  the  target  task, 
and enabling the infant to experience the end or outcome
before she is cognitively or physically able to seek and
attain it for herself (Wood, Bruner &
Ross  1976). 

We  are  currently  engaged  in  work  studying  boot¬ 
strapping  new  behaviors  from  social  interactions.  One 
research  project  focuses  on  building  a  robotic  system 
capable  of  learning  communication  behaviors  in  a  so¬ 
cial  context  where  the  human  provides  various  forms  of 
scaffolding to facilitate the robot’s learning task (Ferrell
1998b). The system uses expressive facial gestures (see
Figure 2) as feedback to the caregiver (Ferrell 1998a).
The  caregiver  can  then  regulate  the  complexity  of  the 
social  interaction  to  optimize  the  robot’s  learning  rate. 

Development of social interaction. The social
skills  required  to  make  use  of  scaffolding  are  complex. 
Infants  acquire  these  social  skills  through  a  develop¬ 
mental  progression  (Hobson  1993).  One  of  the  earliest 
precursors  is  the  ability  to  share  attention  with  the  care¬ 
giver.  This  ability  can  take  many  forms,  from  the  recog¬ 
nition  of  a  pointing  gesture  to  maintaining  eye  contact. 

In  our  work,  we  have  also  examined  social  interac¬ 
tion  from  this  developmental  perspective.  One  research 
program  focuses  on  a  developmental  implementation  of 
shared  attention  mechanisms  based  upon  normal  child 
development, developmental models of autism[1], and
models of the evolutionary development of social
skills  (Scassellati  1996).  The  first  step  in  this  devel¬ 
opmental  progression  is  recognition  of  eye  contact.  Hu¬ 
man  infants  are  predisposed  to  attend  to  socially  rele¬ 
vant  stimuli,  such  as  faces  and  objects  that  have  human¬ 
like  motion.  The  system  is  currently  capable  of  de¬ 
tecting  faces  in  its  peripheral  vision,  saccading  to  the 
faces, and finding eyes within its foveal vision (Scassellati
1998b). This developmental chain has also produced
a simple imitation behavior; the head will mimic
yes/no  head  nods  of  the  caregiver  (Scassellati  1998c). 

Physical  Coupling 

Another  aspect  of  our  methodology  is  to  exploit  inter¬ 
action  and  tight  coupling  between  the  robot  and  its  en¬ 
vironment  to  give  complex  behavior,  to  facilitate  learn¬ 
ing,  and  to  avoid  the  use  of  explicit  models.  Our  sys¬ 
tems  are  physically  coupled  with  the  world  and  operate 
directly in that world without any explicit representations
of it (Brooks 1986, Brooks 1991b). There are representations,
or accumulations of state, but these only
refer  to  the  internal  workings  of  the  system;  they  are 
meaningless  without  interaction  with  the  outside  world. 
The  embedding  of  the  system  within  the  world  enables 
the  internal  accumulations  of  state  to  provide  useful 
behavior  (this  was  the  fundamental  approach  taken  by 
Ashby  (1960)  contemporaneously  with  the  development 
of early AI).

[1] Some researchers believe that the missing mechanisms
of shared attention may be central to autism disorders
(Baron-Cohen 1995). In comparison to other mental
retardation and developmental disorders (like Williams and
Downs Syndromes), the deficiencies of autism in this area
are quite specific (Karmiloff-Smith, Klima, Bellugi, Grant
& Baron-Cohen 1995).

One example of such a scheme is implemented to
control our robot’s arms. As detailed in (Williamson
1998a, Williamson 1998b), a set of self-adaptive oscillators
is used to drive the joints of the arm. Each joint is
actuated by a single oscillator, using proprioceptive
information at that joint to alter the frequency and phase
of the joint motion. There are no connections between the
oscillators, except indirectly through the physical
structure of the arm. Without using any kinematic or
dynamical models, this simple scheme has been used for a
variety of different coordinated tasks, including turning a
crank and playing with a slinky toy. The interaction
between the arm and the environment enables the
oscillators to generate useful behavior. For example,
without the slinky to connect the two arms, they are
uncoordinated, but when the arms are coupled through the
slinky, the oscillators tune into the dynamics of the motion
and coordinated behavior is achieved. In all cases, there is
no central controller, and no modeling of the arms or the
environment; the behavior of the whole system comes from
the coupling of the arm and controller dynamics. Other
researchers have built similar systems which exhibit
complex behavior with either simple or no control (McGeer
1990, Schaal & Atkeson 1993) by exploiting the system
dynamics.
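
To make the idea of a per-joint oscillator with a proprioceptive input concrete, the following is a minimal sketch. It assumes a two-neuron mutually inhibiting oscillator with self-adaptation (the Matsuoka form); the text above says only "self-adaptive oscillators", so the model, all parameter values, and the names simulate_oscillator and count_sign_changes are assumptions for illustration rather than the controller used on the robot.

```python
def simulate_oscillator(feedback=None, duration=10.0, dt=0.001,
                        tau=0.1, tau_adapt=0.2, beta=2.5, gamma=2.0,
                        tonic=1.0):
    """Euler-integrate a two-neuron mutually inhibiting oscillator.

    feedback: optional function of time returning a proprioceptive
    signal (for example a joint angle); None runs the oscillator
    uncoupled. Returns the output samples (neuron 1 minus neuron 2).
    """
    x1, x2, v1, v2 = 0.1, 0.0, 0.0, 0.0   # small asymmetry starts the rhythm
    outputs = []
    for step in range(int(round(duration / dt))):
        g = feedback(step * dt) if feedback is not None else 0.0
        y1, y2 = max(x1, 0.0), max(x2, 0.0)   # rectified firing rates
        # Each neuron receives tonic drive, its own adaptation, inhibition
        # from the other neuron, and one polarity of the feedback signal.
        dx1 = (tonic - x1 - beta * v1 - gamma * y2 - max(g, 0.0)) / tau
        dx2 = (tonic - x2 - beta * v2 - gamma * y1 - max(-g, 0.0)) / tau
        dv1 = (y1 - v1) / tau_adapt
        dv2 = (y2 - v2) / tau_adapt
        x1 += dt * dx1
        x2 += dt * dx2
        v1 += dt * dv1
        v2 += dt * dv2
        outputs.append(y1 - y2)
    return outputs


def count_sign_changes(values):
    """Count transitions between strictly positive and strictly negative."""
    changes, last = 0, 0
    for value in values:
        sign = (value > 0) - (value < 0)
        if sign != 0:
            if last != 0 and sign != last:
                changes += 1
            last = sign
    return changes


if __name__ == "__main__":
    out = simulate_oscillator()
    print(count_sign_changes(out))
```

With no feedback the unit produces a free-running rhythm. In a coupled setting, the feedback function would return the measured state of the joint that this oscillator drives, so that any coordination between joints arises only through the mechanics they share.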

Sensory  Integration 

Sensory  Integration  Simplifies  Computation 

Some  tasks  are  best  suited  for  particular  sensory  modal¬ 
ities.  Attempting  to  perform  the  task  using  only  one 
modality  is  sometimes  awkward  and  computationally 
intensive.  Utilizing  the  complementary  nature  of  sepa¬ 
rate  modalities  results  in  a  reduction  of  overall  compu¬ 
tation.  We  have  implemented  several  mechanisms  on 
Cog  that  use  multimodal  integration  to  aid  in  increas¬ 
ing  performance  or  developing  competencies. 

For  example,  Peskin  &  Scassellati  (1997)  imple¬ 
mented  a  system  that  stabilized  images  from  a  mov¬ 
ing  camera  using  vestibular  feedback.  Rather  than  at¬ 
tempting  to  model  the  camera  motion,  or  to  predict 
motion  effects  based  on  efference  copy,  the  system  mim¬ 
ics  the  human  vestibular-ocular  reflex  (VOR)  by  com¬ 
pensating  for  camera  motion  through  learned  feedback 
from  a  set  of  rate  gyroscopes.  By  integrating  two  sen¬ 
sory  systems,  we  can  achieve  better  performance  than 
traditional  image  processing  methods,  while  using  less 
computation. 
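
A rough sketch of learning such a compensation gain is given below. It is a one-axis toy under stated assumptions, not the system of Peskin & Scassellati (1997): the gyroscope is modeled as a scaled copy of head velocity, residual image motion (retinal slip) serves as the error signal, and the function name learn_vor_gain, the sinusoidal head motion and the learning rate are all hypothetical.

```python
import math


def learn_vor_gain(gyro_scale, steps=2000, rate=0.05, dt=0.01):
    """Learn a gain so that eye velocity = -gain * gyro cancels head motion.

    gyro_scale: assumed ratio between gyroscope reading and head velocity.
    The residual image motion after compensation drives the update.
    """
    gain = 0.0
    for step in range(steps):
        # Hypothetical head motion: a 0.5 Hz sinusoid of unit amplitude.
        head_velocity = math.sin(2.0 * math.pi * 0.5 * step * dt)
        gyro = gyro_scale * head_velocity      # vestibular measurement
        eye_velocity = -gain * gyro            # counter-rotation command
        slip = head_velocity + eye_velocity    # image motion left on the retina
        gain += rate * slip * gyro             # reduce slip correlated with the gyro
    return gain


if __name__ == "__main__":
    print(learn_vor_gain(gyro_scale=2.0))
```

The learner never models the camera motion itself; it only adjusts one gain until the gyroscope signal accounts for the observed slip, which in this toy drives the gain toward the reciprocal of gyro_scale.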

Sensory Integration Facilitates Learning. By integrating
different sensory modalities we can exploit the
complex  nature  of  stimuli  to  facilitate  learning.  For  ex¬ 
ample,  objects  that  make  noise  often  move.  This  cor¬ 
relation  can  be  exploited  to  facilitate  perception.  In 
our  work,  we  have  investigated  primarily  the  relation¬ 
ship  between  vision  and  audition  in  learning  to  orient 
toward  stimuli. 

We  can  characterize  this  relationship  by  examining 
developmental  evidence.  Wertheimer  (1961)  has  shown 
that vision and audition interact from birth; even
ten-minute-old children will turn their eyes toward an
auditory cue. This interaction between the senses continues
to develop; indeed, related investigations with young
owls  have  determined  that  visual  stimuli  greatly  affect 
the  development  of  sound  localization.  With  a  constant 
visual  bias  from  prisms,  owls  adjusted  their  sound  lo¬ 
calization  to  match  the  induced  visual  errors  (Knudsen 
&  Knudsen  1985). 


Irie (1997) built an auditory system for our robot
that utilizes visual information to train auditory
localization; the visually-determined location of a
sound source with a corresponding motion is used to
train an auditory spatial map. This map is then used to
orient the head toward the object. This work highlights
not only the development of sensory integration, but
also the simplification of computational requirements
that can be obtained through integration.
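
The training scheme described above can be sketched in a few lines,
under strong simplifying assumptions. The sketch is not the system of
Irie (1997): the class `AuditoryMap`, the use of a single binned
interaural cue, the running-average update, and the sine-shaped cue
model in the usage lines are all hypothetical choices made for
illustration.

```python
import math


class AuditoryMap:
    """Toy auditory spatial map: binned auditory cue -> azimuth (degrees)."""

    def __init__(self, n_bins=21, max_cue=1.0):
        self.n_bins = n_bins
        self.max_cue = max_cue
        self.azimuth = [0.0] * n_bins   # running mean of visual azimuth
        self.count = [0] * n_bins       # training samples seen per bin

    def _bin(self, cue):
        # Clip the cue to the expected range and map it to a bin index.
        clipped = max(-self.max_cue, min(self.max_cue, cue))
        frac = (clipped + self.max_cue) / (2.0 * self.max_cue)
        return min(self.n_bins - 1, int(frac * self.n_bins))

    def train(self, cue, visual_azimuth):
        # The visually determined location acts as the teaching signal.
        i = self._bin(cue)
        self.count[i] += 1
        self.azimuth[i] += (visual_azimuth - self.azimuth[i]) / self.count[i]

    def locate(self, cue):
        # Return the learned azimuth, or None for an untrained bin.
        i = self._bin(cue)
        return self.azimuth[i] if self.count[i] else None


# Hypothetical cue model: the auditory cue varies as sin(azimuth).
auditory_map = AuditoryMap()
for az in range(-90, 91):
    auditory_map.train(math.sin(math.radians(az)), float(az))

estimate = auditory_map.locate(math.sin(math.radians(30.0)))
```

Once trained, the map is queried with the auditory cue alone, which is
the sense in which vision simplifies the auditory computation: no
acoustic model of the head is needed, only co-occurring sights and
sounds.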

Challenges  for  Intelligent  Systems 

This new approach to designing and studying intelligent
systems leaves us with a new set of challenges to
overcome. Here are the key questions which we must
now answer.

•  Scaling and Development: What learning structures
and organizational principles will allow us to design
successful developmental systems?

•  Social  Interaction: 

—  How  can  the  system  learn  to  communicate  with 
humans? 

—  What attributes of an infant must the machine
emulate in order to elicit caregiver behavior from
humans?

—  What drives, motivations and emotions are necessary
for the system to communicate effectively with
humans?

•  Physical  Coupling: 

—  How can a system scale the complexity of its
coordinated motion while still exploiting the dynamics?

—  How  can  the  system  combine  newly  learned  spatial 
skills  with  previously  learned  spatial  skills?  How 
is  that  memory  to  be  organized? 

—  How  can  the  system  use  previously  learned  skills 
in  new  contexts  and  configurations? 

•  Integration: 

—  How  can  a  conglomerate  of  subsystems,  each  with 
different  or  conflicting  goals  and  behaviors,  act 
with  coherence  and  stability? 

—  At what scale should we emulate the biological
organism, keeping in mind engineering constraints?

—  What are good measures of performance for
integrated systems which develop, interact socially and
are physically coupled with the world?

Conclusion 

We have reported on a work in progress which
incorporates a new methodology for achieving Artificial
Intelligence. We have built a humanoid robot that
operates and develops in the world in ways that are
similar to the ways in which human infants operate and
develop. We have demonstrated learning to saccade,
learning to correlate auditory and ocular coordinate
systems, self-adapting vestibular-ocular systems,
coordinated neck and ocular systems, learning of
hand-eye coordination, localization of multiple sound
streams, variable-stiffness arms that interact safely
with people, arm control based on biological models of
invertebrate spinal circuits, adaptive arm control that
tunes in to subtle physical cues from the environment,
face detection, eye detection to find gaze direction,
coupled human-robot interaction that is a precursor to
caregiver scaffolding for learning, large-scale
touch-sensitive body skin, and multi-fingered hands
that learn to grasp objects based on self-categorized
stiffness properties. These are components for
higher-level behaviors that we are beginning to put
together using models of shared attention, emotional
coupling between robot and caregiver, and developmental
models of human infants.

We  have  thus  chosen  to  approach  AI  from  a  different 
perspective,  in  the  questions  we  ask,  the  problems  we 
try  to  solve,  and  the  methodology  and  inspiration  we 
use  to  achieve  our  goals.  Our  approach  has  led  us  not 
only  to  formulate  the  problem  in  the  context  of  general 
human-level  intelligence,  but  to  redefine  the  essences 
of  that  intelligence.  Traditional  AI  work  has  sought 
to  narrowly  define  a  problem  and  apply  abstract  rules 
to  its  solution.  We  claim  that  our  goal  of  creating  a 
learning,  scalable,  intelligent  system,  with  competencies 
similar  to  human  beings,  is  altogether  relevant  in  trying 
to  solve  a  broad  class  of  real-world  situated  problems. 
We  further  believe  that  the  principles  of  development, 
social  interaction,  physical  coupling  to  the  environment, 
and  integration  are  essential  to  guide  us  towards  our 
goal. 